AMENDMENTS TO THE CLAIMS:



1. (Currently amended) A computer, as controlled to implement a method of
increasing efficiency in executing a matrix operation that uses matrix data in a standard 
format, said standard format comprising one of a column major format and a row major 
format, said method comprising: 

for a matrix A data stored in said standard format, wherein said matrix data comprises
data of any of a complete matrix, a complete submatrix, or a part of a matrix or submatrix,
separating said matrix A data into blocks of data, each said block having a size p-by-q; and

at least one of: 

storing elements in at least one of said blocks in at least one of a cache and a 
memory in a format in which elements of said block occupy a location different from an 
original location in said block; and 

storing said blocks of size p by q in said at least one of cache and memory in a 
format in which at least one said block occupies a position different relative to its original 
position in said matrix A 

rearranging and placing in a storage of said computer, for retrieval for executing said 
matrix operation, said blocks of data to be contiguous blocks of contiguous data such that 
said matrix data is represented in a nonstandard format that permits said matrix data to be 
moved from said storage into a position for performing said matrix operation more quickly 
than if said matrix data had been moved as stored in said standard format. 
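
For illustration only, the separating and rearranging steps recited above can be sketched as a repacking routine. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `to_blocked`, the requirement that p divide the row count and q divide the column count, and the choice to order blocks by block column with each block stored column major are all hypothetical.

```python
def to_blocked(a, m, n, p, q):
    """Repack an m-by-n column-major matrix into contiguous p-by-q blocks.

    `a` is a flat list holding element (i, j) at a[i + j * m].
    Assumptions of this sketch: p divides m, q divides n, blocks are
    emitted block column by block column, and each block is itself
    stored column major.
    """
    if m % p or n % q:
        raise ValueError("sketch assumes p divides m and q divides n")
    out = []
    for bj in range(0, n, q):            # first column of the block
        for bi in range(0, m, p):        # first row of the block
            for j in range(bj, bj + q):  # copy the block, column by column
                for i in range(bi, bi + p):
                    out.append(a[i + j * m])
    return out


# 4-by-4 matrix whose (i, j) entry is 10*i + j, stored column major.
m = n = 4
a = [10 * i + j for j in range(n) for i in range(m)]
blocked = to_blocked(a, m, n, 2, 2)
print(blocked[:4])  # entries (0,0), (1,0), (0,1), (1,1): [0, 10, 1, 11]
```

After repacking, the four elements of every 2-by-2 block sit next to each other, whereas in the column-major input the two columns of a block are m elements apart.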



2. (Currently amended) The computer of claim 22,
wherein said co-processing unit comprises a floating point unit (FPU) and said loading said
matrix data into said set of data registers comprises loading said blocks from said
storage into a subset of data registers in said set of data registers, using a deviation from a normal
floating point loading instruction of the floating point unit (FPU) of the computer.

3. (Canceled) 

4. (Currently amended) The computer of claim 1, wherein said size p-by-q comprises
a 2-by-2 block. 

5. (Currently amended) The computer of claim 2, wherein said deviation from normal
floating point loading comprises a crisscrossing of
elements about a diagonal of said blocks.

6. (Currently amended) The computer of claim 2, wherein said matrix operation
comprises a linear algebra operation, said method further comprising: 

selectively, at least one of loading input data and storing a result of said linear algebra 
matrix operation into or out of said co-processing unit from into one of a second set of data 
registers and a cache memory unit L1 cache or memory by at least one of a subset of optimal
load and store instructions, said loading and storing a result being dictated by an optimal FPU
loading or storage instruction. 

7. (Currently amended) The computer of claim 2, wherein said deviation
of said normal floating point loading instruction, in combination with said storing
said blocks in a nonstandard format, provides data of a transpose
of said matrix data to reside in said data registers of said FPU.

8. (Currently amended) The computer of claim 2, wherein said loading comprises a
checkerboard 2x2 crisscrossing technique.
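
For illustration only, one reading of how a crisscrossing about the diagonal of 2-by-2 blocks, combined with placing blocks at a different position, yields a transpose can be sketched as follows. This is a sketch under stated assumptions rather than the claimed FPU load sequence: the function name `transpose_by_2x2_crisscross`, the list-of-rows representation, and the restriction to square matrices of even order are hypothetical.

```python
def transpose_by_2x2_crisscross(a):
    """Transpose a square matrix of even order, given as a list of rows.

    Two things happen together: the 2-by-2 block at block position (r, c)
    is written to the mirrored position (c, r), and inside each block the
    two off-diagonal elements trade places while the diagonal pair stays.
    """
    n = len(a)
    if n % 2 or any(len(row) != n for row in a):
        raise ValueError("sketch assumes a square matrix of even order")
    t = [[None] * n for _ in range(n)]
    for r in range(0, n, 2):
        for c in range(0, n, 2):
            # Read the source block row by row.
            a00, a01 = a[r][c], a[r][c + 1]
            a10, a11 = a[r + 1][c], a[r + 1][c + 1]
            # Write it at the mirrored block position with a01 and a10 swapped.
            t[c][r], t[c][r + 1] = a00, a10
            t[c + 1][r], t[c + 1][r + 1] = a01, a11
    return t


a = [[4 * i + j for j in range(4)] for i in range(4)]
t = transpose_by_2x2_crisscross(a)
```

Neither step alone transposes the matrix; moving the blocks handles the coarse structure and the in-block exchange handles the fine structure.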

9. (Currently amended) The computer of claim 6, wherein said linear algebra
operation comprises one of a BLAS kernel and a factorization kernel.

10-16. (Canceled) 

17. (Currently amended) A signal-bearing medium tangibly embodying a program of 
machine-readable instructions executable by a digital processing apparatus to perform a 
method of storing information of a matrix in a register block data format, said method 
comprising: 

receiving data for a matrix A, said data comprising one of a complete matrix data, a
complete submatrix data, and a partial matrix or submatrix data, said matrix data being stored
in one of a standard column format and a standard row format;

dividing said matrix A data into blocks, each said block having a size p-by-q; and 

at least one of: 

storing elements in at least one of said blocks in at least one of a cache and a 
memory in a format in which elements of said block occupy a location different from an
original location in said block; and

storing said blocks of size p-by-q in a memory in a format in which at least 
one said block occupies a position different from its original position in said matrix A,

said register data block format converting the matrix data to no longer be in either of 
said standard column format or said standard row format.

18. (Currently amended) The signal-bearing medium of claim 17, said method further
comprising: 

loading said blocks from said memory into a plurality of data registers so that a 
format of data in said data registers comprises a transpose data of said matrix A. 

19. (Currently amended) The signal-bearing medium of claim 18, wherein said loading 
comprises a loading using a checkerboard 2x2 crisscrossing technique. 

20. (Canceled) 

21. (New) The computer of claim 1, wherein said matrix operation is executed on a co- 
processing unit of said computer and said position for performing said matrix operation 
comprises a set of data registers of said co-processing unit, said method further comprising: 

retrieving said matrix data from said storage in said nonstandard format; and 
loading said matrix data into at least a subset of said set of data registers in an optimal 
format, said optimal format comprising a format of said matrix data in said data registers such 
that a minimal possible time is required to utilize said matrix data in said data registers in said 
matrix operation in said co-processing unit. 

22. (New) The computer of claim 21, wherein said computer includes at least one of a 
machine architecture and an instruction set having one or more features that are less than 
optimal for executing said matrix operation, and said nonstandard format of matrix data and 
said optimal format in said data registers together provide a mechanism that overcomes said 
one or more features that are less than optimal for executing said matrix operation. 

23. (New) A computer configured to implement a method of increasing efficiency in
executing a matrix operation that uses matrix data in a standard format, said standard format 
comprising one of a column major format and a row major format, said method comprising: 

converting at least a part of said matrix data into a pseudo matrix format comprising 
contiguous data that no longer represents said matrix data in said standard format, each 
pseudo matrix comprising a subset of said matrix data that is predetermined to permit a 
loading of said pseudo matrix data into a processing unit in an optimal format to perform said 
matrix operation, said optimal format comprising a format that allows a minimal possible 
time in said processing unit to utilize said matrix data in said matrix operation. 

24. (New) The computer of claim 23, said method further comprising successively loading 
elements of each said pseudo matrix into said processing unit for executing said matrix 
operation, wherein said loading comprises successively placing data of each said pseudo 
matrix into predetermined registers of a register set of said processor in said optimal format. 

25. (New) The computer of claim 24, said method further comprising: 

processing said matrix operation on said data in said optimal format, a result of said 
processing being stored in predetermined registers of said register set; and 

storing said result from said predetermined registers of said register set into memory 
in said pseudo matrix format. 
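
For illustration only, the load, process, and store sequence recited in claims 23 to 25 can be sketched for a matrix product on data held as contiguous 2-by-2 blocks. This is a minimal sketch under stated assumptions, not the claimed method: the names `block_offset` and `blocked_matmul`, the 2-by-2 block size, the row-by-row ordering of blocks and of elements within a block, and the use of local variables as stand-ins for registers are all hypothetical.

```python
def block_offset(r, c, nb):
    """Start index of block (r, c) when 2-by-2 blocks are stored row by row."""
    return 4 * (r * nb + c)


def blocked_matmul(a, b, nb):
    """Multiply two matrices held as flat lists of contiguous 2-by-2 blocks.

    Each matrix has nb-by-nb blocks; a block is stored as [x00, x01, x10, x11].
    The result is returned in the same blocked layout.
    """
    c = [0.0] * (4 * nb * nb)
    for r in range(nb):
        for s in range(nb):
            # Accumulators play the role of predetermined result registers.
            c00 = c01 = c10 = c11 = 0.0
            for k in range(nb):
                oa = block_offset(r, k, nb)
                ob = block_offset(k, s, nb)
                # Each block is read as four adjacent elements.
                a00, a01, a10, a11 = a[oa:oa + 4]
                b00, b01, b10, b11 = b[ob:ob + 4]
                c00 += a00 * b00 + a01 * b10
                c01 += a00 * b01 + a01 * b11
                c10 += a10 * b00 + a11 * b10
                c11 += a10 * b01 + a11 * b11
            # Store the finished block back in the blocked layout.
            oc = block_offset(r, s, nb)
            c[oc:oc + 4] = [c00, c01, c10, c11]
    return c
```

Because every operand block is four adjacent elements, the inner loop never strides across a full row or column of the original matrix, and the result stays in the blocked layout for any later operation.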

26. (New) A computer having at least one of a machine architecture and an instruction set 
having one or more features that are less than optimal for executing a matrix operation, said 
computer configured to implement a method of overcoming said disadvantage, said method 
comprising: 

rearranging at least a part of matrix data to be used in said matrix operation into a
plurality of blocks, each block having size p-by-q, such that said matrix data is no longer 
stored in a standard matrix format comprising one of a row major format and a column major 
format, said rearranged matrix data in said blocks being stored as contiguous blocks of 
contiguous data in a nonstandard format, 

wherein said nonstandard format of said matrix data is predetermined to allow said 
matrix data to be placed into a processing unit for processing said matrix data in said matrix 
operation such that said disadvantage on said computer is overcome. 

27. (New) The computer of claim 26, said method further comprising: 

loading said matrix data in said nonstandard format into at least a subset of data 
registers of said processing unit in an optimal format, said optimal format comprising a 
format allowing a minimal possible time in said processing unit to utilize said matrix data in 
said matrix operation. 

28. (New) A computer configured to implement a method of overcoming a hardware 
disadvantage on said computer relative to a specific processing on a specific computer 
architecture/set of instructions, said method comprising: 

using first software instructions to preliminarily process input data to be used in said 
specific processing on said specific computer architecture/set of instructions in a manner to 
generate a first error relative to said specific processing; and 

using second software instructions to subsequently process said input data in a manner 
to generate a correcting error relative to said specific processing, 

wherein first software instructions in combination with said second software 
instructions overcome said disadvantage. 

29. (New) The computer of claim 30, wherein said specific processing comprises a matrix
operation and said first error comprises storing matrix data in a format that converts matrix 
data from a standard column major or row major format into a nonstandard format 
predetermined to overcome said disadvantage when said data is subjected to said correcting 
error. 



